Quantitative Methodology (UPF)
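Almost always, a ggplot consists of three layers: the data, the aesthetic mappings (aes()) and a geometry (geom_*()).
Optionally, we add more layers, such as facets, coordinates, scales, labels and themes.
Facets in a single column: facet_wrap(facets = vars(v), ncol = 1)
Facets in a single row: facet_wrap(facets = vars(v), nrow = 1)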
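library(tidyverse)

elecc19 |>
  filter(nombre_de_comunidad == "Cataluña") |>
  mutate(erc_per = erc_sobiranistes / total_votantes * 100,
         psoe_per = psoe / total_votantes * 100,
         # "Ciutat" (town) above 15,000 inhabitants, otherwise "Poble" (village)
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |>
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  # one panel per province, all in one row; each panel gets its own axis ranges
  facet_wrap(facets = vars(nombre_de_provincia),
             nrow = 1, scales = "free")
Without nrow or ncol: facet_wrap(facets = vars(v)) (example with CatSalut data)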
library(tidyverse)
library(lubridate)

covid <- read_csv("data/Dades_di_ries_de_COVID-19_per_comarca.csv")

covid |>
  filter(NOM != "Sense especificar") |>   # drop rows with an unspecified comarca
  mutate(DATA = as.Date(DATA, format = "%d/%m/%Y"),
         MONTH = month(DATA),
         YEAR = year(DATA)) |>
  group_by(NOM, MONTH, YEAR) |>
  summarize(casos = sum(CASOS_CONFIRMAT), .groups = "drop") |>   # monthly cases per comarca
  mutate(DATE = make_date(YEAR, MONTH, 1)) |>   # first day of each month
  ggplot(aes(x = DATE, y = casos)) +
  geom_line() +
  facet_wrap(facets = vars(NOM), scales = "free")
Rows and columns in a grid: facet_grid()
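A minimal sketch contrasting facet_wrap(ncol = 1) with facet_grid(), using R's built-in mtcars data instead of the course datasets (an assumption made only so the example runs anywhere). ggplot_build() exposes the panel layout that ggplot2 computes, so we can check how many rows and columns of panels each call produces.

```r
library(ggplot2)

# facet_wrap(): one panel per number of cylinders (4, 6, 8), stacked in one column
p_col <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(facets = vars(cyl), ncol = 1)

# facet_grid(): rows defined by transmission (am: 0 = automatic, 1 = manual),
# columns defined by the number of cylinders
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(am), cols = vars(cyl))

# Panel layouts as computed by ggplot2 (one row of the table per panel)
col_layout <- ggplot_build(p_col)$layout$layout
grid_layout <- ggplot_build(p_grid)$layout$layout
print(grid_layout[, c("PANEL", "ROW", "COL")])
```

facet_wrap() simply wraps a single variable into a ribbon of panels, while facet_grid() crosses two variables, giving a 2 x 3 grid here.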
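Flip axes: coord_flip()
Change limits (zoom in): coord_cartesian(ylim = c(30, 40))
Easier: ylim(30, 40) / xlim(30000, 70000). Note that these set the limits of the scale: observations outside the range are removed (with a warning) instead of just being hidden, which also changes any statistic computed from them.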
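A small sketch of the difference between zooming with coord_cartesian() and setting scale limits with ylim(), on ten toy points (y = 1, ..., 10, invented for illustration). layer_data() returns the data as ggplot2 will draw it, so we can count how many y values were turned into NA.

```r
library(ggplot2)

# Toy data (illustrative only): y takes the values 1 to 10
df <- data.frame(x = 1:10, y = 1:10)

# Zoom: all ten points are kept; only the visible window is 3 <= y <= 7
p_zoom <- ggplot(df, aes(x, y)) +
  geom_point() +
  coord_cartesian(ylim = c(3, 7))

# Scale limits: points outside 3..7 become NA and are dropped when drawing
p_limits <- ggplot(df, aes(x, y)) +
  geom_point() +
  ylim(3, 7)

# Flipped axes: x is drawn vertically and y horizontally
p_flip <- ggplot(df, aes(x, y)) +
  geom_point() +
  coord_flip()

n_missing_zoom <- sum(is.na(layer_data(p_zoom)$y))
n_missing_limits <- sum(is.na(layer_data(p_limits)$y))
```

Here n_missing_zoom is 0 and n_missing_limits is 5 (y = 1, 2, 8, 9, 10), which is why coord_cartesian() is the safer choice when a plot also shows means, smoothers or other summaries.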
Scale functions:
scale_x_*() or scale_y_*(): discrete() or continuous() (and others). Arguments of the function:
breaks: position of the breaks.
labels: labels of the breaks.
name: title of the axis.
limits: limits of the scale.
Numeric variable: scale_x_continuous()
Categorical variable: scale_x_discrete()
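A sketch of the four arguments on R's built-in mtcars data (used here instead of the course data as an assumption). The continuous scale sets the axis title, the tick positions, their text and the limits; the discrete scale renames the categories through a named labels vector.

```r
library(ggplot2)

# Numeric x: weight (wt, in 1000 lbs) against miles per gallon
p_scales <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_continuous(
    name   = "Weight (1000 lbs)",     # axis title
    breaks = c(2, 3, 4, 5),           # positions of the tick marks
    labels = c("2", "3", "4", "5"),   # text shown at each break
    limits = c(1, 5)                  # cars heavier than 5 become NA (not drawn)
  )

# Categorical x: number of cylinders, relabelled by category name
p_discrete <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  scale_x_discrete(name = "Cylinders",
                   labels = c("4" = "four", "6" = "six", "8" = "eight"))

# Three cars in mtcars weigh more than 5 (x 1000 lbs), so they fall outside the limits
n_out <- sum(is.na(layer_data(p_scales)$x))
```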
Percentages: a numeric variable from 0 to 1 (a proportion), formatted with scales::label_percent().
elecc19 |>
  filter(nombre_de_comunidad == "Cataluña") |>
  # proportions (0 to 1), not percentages: label_percent() multiplies by 100
  mutate(erc_per = erc_sobiranistes / total_votantes,
         psoe_per = psoe / total_votantes,
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |>
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  scale_y_continuous(labels = scales::label_percent())
Scale function: Brewer.
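scale_color_brewer(), scale_fill_brewer(). Arguments of the function (see help):
type: "seq" (sequential), "div" (diverging) or "qual" (qualitative).
palette: a palette name, such as "Greens", "Set1" or "Spectral", or its index (1, 2…).
direction: 1 or -1 (-1 reverses the order of the colors).
For the available palettes, see the web.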
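A minimal sketch of scale_fill_brewer() on three invented categories (the toy data frame is an assumption for illustration). The fill colors ggplot2 assigns should be exactly the first three colors of the ColorBrewer "Set1" palette, which the RColorBrewer package (installed together with ggplot2's scales dependency) can list.

```r
library(ggplot2)

# Toy data (illustrative only): three categories with a value each
df <- data.frame(group = c("a", "b", "c"), value = c(3, 5, 2))

p_brewer <- ggplot(df, aes(x = group, y = value, fill = group)) +
  geom_col() +
  # Qualitative palette "Set1"; direction = -1 would reverse the color order
  scale_fill_brewer(type = "qual", palette = "Set1", direction = 1)

fills <- layer_data(p_brewer)$fill
print(fills)
```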
Scale function: Gradient.
scale_color_gradient(), scale_fill_gradient(). Arguments of the function:
low: color of the lowest value.
high: color of the highest value.
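A sketch of scale_color_gradient() on three toy points whose mapped values are 0, 5 and 10 (invented data). The lowest value gets the low color, the highest the high color, and the middle one an interpolated shade, so the three points end up with three different colors.

```r
library(ggplot2)

# Toy data (illustrative only): a continuous variable mapped to color
df <- data.frame(x = 1:3, y = 1:3, value = c(0, 5, 10))

p_gradient <- ggplot(df, aes(x = x, y = y, col = value)) +
  geom_point(size = 4) +
  # value = 0 is drawn in grey90, value = 10 in darkred, 5 in between
  scale_color_gradient(low = "grey90", high = "darkred")

pts <- layer_data(p_gradient)
print(pts[, c("x", "colour")])
```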
Scale function: Manual.
scale_color_manual(), scale_fill_manual(). Arguments of the function:
values: color of each category.
labels: name of each category (as shown in the legend).
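A sketch of scale_color_manual() with named vectors, on invented data with two categories "A" and "B". Naming values and labels ties each color and legend entry to a specific category, so the mapping does not depend on the alphabetical order of the levels.

```r
library(ggplot2)

# Toy data (illustrative only): two categories, two points each
df <- data.frame(x = 1:4, y = c(2, 4, 3, 5), type = c("A", "A", "B", "B"))

p_manual <- ggplot(df, aes(x = x, y = y, col = type)) +
  geom_point(size = 3) +
  scale_color_manual(
    values = c(A = "orange", B = "darkgreen"),   # color tied to each category
    labels = c(A = "Group A", B = "Group B")     # text shown in the legend
  )

pts <- layer_data(p_manual)
print(pts[, c("x", "colour")])
```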
elecc19 |>
  filter(nombre_de_comunidad == "Cataluña") |>
  mutate(erc_per = erc_sobiranistes / total_votantes,
         psoe_per = psoe / total_votantes,
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |>
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  # unnamed values are matched to the categories in alphabetical order: Ciutat, then Poble
  scale_color_manual(values = c("orange", "darkgreen"),
                     labels = c("Ciutat", "Poble"))
Useful websites for colors:
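Other scales: scale_size(), scale_shape(), scale_alpha(), scale_linewidth().
Titles and labels: labs()
Arguments: title, subtitle, caption…; x, y, col, fill… (axis and legend titles).
Themes: theme_minimal(), theme_light()…
More themes (WSJ, The Economist, Excel…) in the ggthemes package.
Custom themes: very time-consuming at the beginning. See the ggplot2 documentation.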
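A sketch combining another scale, labs() and a theme, on R's built-in mtcars data (chosen as an assumption so the example is self-contained). scale_size() maps horsepower to point area within the given range, so the smallest and largest cars get sizes 1 and 6.

```r
library(ggplot2)

p_final <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5) +
  # size legend titled "Horsepower"; point sizes go from 1 (min hp) to 6 (max hp)
  scale_size(name = "Horsepower", range = c(1, 6)) +
  labs(title = "Fuel consumption and weight",
       subtitle = "Built-in mtcars data",
       caption = "Source: R datasets package",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal() +
  theme(text = element_text(size = 15))   # larger text everywhere

size_range <- range(layer_data(p_final)$size)
```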
library(tidyverse)

options(scipen = 999)  # avoid scientific notation (e.g. 1e+05) in the axis labels

elecc19 |>
  filter(nombre_de_comunidad == "Cataluña") |>
  transmute(poblacion,
            ERC = erc_sobiranistes / total_votantes,
            PSC = psoe / total_votantes) |>
  # long format: one row per municipality and party
  pivot_longer(ERC:PSC, names_to = "partit", values_to = "perc") |>
  ggplot(aes(x = log10(poblacion), y = perc, col = partit)) +
  geom_point(size = 2, alpha = 0.6, show.legend = FALSE) +
  facet_wrap(facets = vars(partit),
             nrow = 1) +
  scale_color_manual(values = c("gold2", "firebrick2")) +
  scale_y_continuous(labels = scales::label_percent(), name = "Percentatge de vot") +
  # x is log10(population): the break at 2 means 100 inhabitants, 3 means 1000, etc.
  scale_x_continuous(name = "Població", breaks = c(2, 3, 4, 5, 6),
                     labels = c(100, 1000, 10000, 100000, 1000000)) +
  labs(title = "Vot a ERC i PSC a Catalunya",
       subtitle = "Dades de vot per municipi a les eleccions generals de 2019",
       caption = "Font: Ministeri de l'Interior") +
  theme(text = element_text(size = 15))
Use the cheat sheet and manuals: